Natural Language Processing on Heroku
Notes on turning a natural language processing algorithm that I had been experimenting with locally into an API server on Heroku
I'll refer to my Heroku+Flask notes from when I did something similar before.
Create a new working directory
I considered reusing an existing repository, but it is complicated enough that tracking down the cause of a problem would take a long time and be unproductive, so I'll keep things as simple as possible.
$ mkdir regroup-split-server
$ cd regroup-split-server
Create a virtual environment and work on VSCode
$ python3 -m venv venv
$ code .
View -> Terminal
$ source venv/bin/activate
Create a minimal server with Flask
$ mkdir server
$ code server/__init__.py
code:python
from flask import Flask
app = Flask(__name__)
def create_app():
    return app
@app.route('/')
def root():
    return "OK"
$ pip install --upgrade pip
$ pip install flask
Set environment variables in a file
$ code .env
code:.env
FLASK_APP=server
FLASK_ENV=development
$ pip install python-dotenv
$ flask run
$ git init
$ code .gitignore
code:.gitignore
venv/
*.pyc
__pycache__/
$ git add .
$ git commit -m 'minimal Flask server'
Actually, I do this with Cmd+Enter in the Source Control tab of VSCode.
Deploying to Heroku also gives the Flask server HTTPS. I include this in the minimal configuration because I don't think an HTTP-only server is acceptable as a modern API server.
$ pip install gunicorn
$ pip freeze > requirements.txt
$ code Procfile
code:Procfile
web: gunicorn server:"create_app()"
$ heroku create regroup-split-server
$ git commit -m "add gunicorn"
$ git push --set-upstream heroku master
The build log is displayed. Make sure there are no errors.
$ heroku open
Open the deployed app in a browser and make sure OK is displayed.
I haven't yet figured out how to organize this deployment repository and the local R&D repository so that both stay clean.
I'm sure I'll want to separate them at some point, but until I have a clearer idea of how, I'll keep them together.
I've previously connected separate repositories with hard links, but I don't think that is a good idea.
It would probably be better to connect them with a git submodule or pip.
Keep the folders separate so that they are easy to split in the future.
$ mkdir server/regroup_split
Copy files that look necessary
code:deploy.sh
mkdir -p ../regroup-split-server/server/regroup_split/test
cp rich_tokenizer.py ../regroup-split-server/server/regroup_split/
cp regroup_split.py ../regroup-split-server/server/regroup_split/
cp TAIL_TOKENS_TO_REMOVE.txt ../regroup-split-server/server/regroup_split/
cp HEAD_TOKENS_TO_REMOVE.txt ../regroup-split-server/server/regroup_split/
cp test/simplelines1.txt ../regroup-split-server/server/regroup_split/test/
cp test/regression_test.json ../regroup-split-server/server/regroup_split/test/
Run unit tests and check for errors.
ModuleNotFoundError: No module named 'MeCab'
$ pip install mecab
$ pip install mecab-python3==0.996.5
Once the unit tests pass, call the tests from server/__init__.py
Run flask run to check that the tests also work on the local development server
It's easier to read error messages on the local development server than after deployment.
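Calling the tests from the server could look roughly like the following. This is only a sketch under assumptions: `SmokeTest` is a hypothetical stand-in for the real regroup_split tests, and `run_tests_as_text` is a made-up helper whose return value a Flask route could send back as the response body.

```python
import io
import unittest


class SmokeTest(unittest.TestCase):
    """Hypothetical stand-in for the real regroup_split unit tests."""

    def test_split(self):
        self.assertEqual("a b".split(), ["a", "b"])


def run_tests_as_text():
    """Run the test case and return the runner's report as a string."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
    stream = io.StringIO()  # capture the report instead of printing it
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    status = "OK" if result.wasSuccessful() else "FAILED"
    return status + "\n" + stream.getvalue()


print(run_tests_as_text())
```

Returning the report as text means the same function can be used from the terminal and from a route such as root().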
Common fixes
Use relative imports: from .foo import bar.
I usually run the code as a script while experimenting, but here it is imported by the server and run as a module, so the import behavior changes.
Paths to data files
Code that depends on the current directory at runtime will get into trouble here.
Use DIR = os.path.dirname(__file__) and build the paths from it.
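The data-file fix can be sketched as follows. This is a minimal sketch, not the actual code in regroup_split.py: the helper name `data_path` is hypothetical, and the module's file path is passed in explicitly so the function is easy to try on its own.

```python
import os


def data_path(module_file, filename):
    """Return the absolute path of `filename` located next to `module_file`.

    `data_path` is a hypothetical helper name, not part of the original code.
    """
    # Directory containing the module, independent of the current directory.
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)


# Inside a module you would pass that module's own __file__, e.g.
#   path = data_path(__file__, "HEAD_TOKENS_TO_REMOVE.txt")
example = data_path("server/regroup_split/regroup_split.py",
                    "HEAD_TOKENS_TO_REMOVE.txt")
print(example)
```

Because the path is anchored to the module rather than to the working directory, it resolves the same way under flask run, gunicorn, and a script run from the terminal.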
Push to heroku when it works locally
$ pip freeze > requirements.txt
Don't forget to git add and commit!
I should have done that right after installing the package.
$ git push
The build succeeds, but heroku open shows a 500 error
View runtime logs
$ heroku logs --tail
TypeError: 'dict_keys' object is not reversible
Python on Heroku is 3.6 by default, and reversed() on dictionary views only works on Python 3.8 or later
Pin the Heroku runtime to the same version I use locally
$ echo python-3.8.7 > runtime.txt
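The error comes from a version difference: dictionary views became reversible in Python 3.8. If pinning the runtime were not an option, one portable workaround would be to copy the keys into a list first. A minimal sketch; `keys_newest_first` is a hypothetical name, not a function from the original scripts.

```python
import sys


def keys_newest_first(d):
    """Return the keys of `d` in reverse insertion order.

    Copying the keys into a list first also works on Python 3.6 and 3.7,
    where reversed(d.keys()) raises TypeError.
    """
    return list(reversed(list(d.keys())))


counts = {"first": 1, "second": 2, "third": 3}
print(sys.version_info[:2], keys_newest_first(counts))
```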
Test cases now work on heroku as well.
Until now the experimental scripts have been run in a terminal, with the results observed on standard output. Add an interface to them that takes a value passed from the server and returns the processed result.
In this case, it takes a string and returns a list of token strings.
At this point you have to decide whether to return a rich object or something that can be serialized to JSON.
That in itself depends on the application.
However, making things JSON-serializable is needed everywhere, so I think it is good to have it on the library side.
Proper serialization can change as internal structures change, and
code:python
def process_single_line(line):
    tokens = tokenize(line)
    calc_split_priority(tokens)
    return dict(
        tokens=concat_tokens(tokens, " "),
    )
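One way to keep JSON conversion on the library side is to give the rich object its own conversion method. The sketch below is an assumption for illustration only: the `Token` class and its fields `text` and `priority` are hypothetical and do not reflect the real internals of regroup_split.

```python
import json
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical rich token; the real internal structure may differ."""
    text: str
    priority: float

    def to_json_dict(self):
        # Expose only plain types so json.dumps() can handle the result.
        return {"text": self.text, "priority": self.priority}


def serialize_tokens(tokens):
    """Convert a list of Token objects into a JSON-serializable dict."""
    return {"tokens": [t.to_json_dict() for t in tokens]}


tokens = [Token("sticky", 0.2), Token("notes", 0.9)]
print(json.dumps(serialize_tokens(tokens)))
```

With this split, the server only ever sees plain dicts, and the library remains free to change its internal token structure.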
GET
code:python
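from flask import request
@app.route('/api/', methods=['GET'])
def api():
    # Read the input text from the q query parameter
    text = request.args.get('q', '')
    ret = regroup_split.process_single_line(text)
    return ret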
Check that it works by sending a GET request to /api/?q=...
The returned dict is automatically serialized to JSON
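When checking the GET endpoint from a script rather than a browser, the text in q has to be percent-encoded, which matters for Japanese input. A minimal sketch using only the standard library; the base URL is the local development server used on this page, and `build_api_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_api_url(base_url, text):
    """Return base_url with `text` percent-encoded as the q parameter."""
    return base_url + "?" + urlencode({"q": text})


url = build_api_url("http://localhost:5000/api/", "付箋をたくさん作る")
print(url)

# Decoding the query string gives back the original text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```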
POST
code:python
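@app.route('/api/', methods=['GET', 'POST'])
def api():
    if request.method == "GET":
        # GET: text comes from the q query parameter
        text = request.args.get('q', '')
    else:
        # POST: text comes from the q field of the JSON body
        text = request.get_json()['q']
    ret = regroup_split.process_single_line(text)
    return ret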
$ curl -X POST -H "Content-Type: application/json" -d '{"q":"test"}' localhost:5000/api/
Check that it works locally
Then git push and make sure it works on Heroku as well
Create a client side that calls this API
code:python
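import requests
# Local development server; replace with the deployed Heroku URL as needed.
API_URL = "http://localhost:5000/api/"
sample_text = "Ah, so people who are not used to the process of making lots of stickies and doing the KJ method don't have a good idea of how granular the information should be at the point of making the stickies in the first place. That's where the software needs to help."
payload = {"q": sample_text}
r = requests.post(API_URL, json=payload)
assert r.ok
# Assumes the "tokens" field of the response is a list of strings.
for s in r.json()["tokens"]:
    print(s)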
"""
Expected output:
Make lots of sticky notes.
People unfamiliar with the process of doing the KJ method.
How granular is the information at the point where you make a sticky note?
I can't pinpoint a good one.
Software needs to support
"""
Call from JS
---
This page is auto-translated from /nishio/Herokuで自然言語処理. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.